Singapore is one of the preferred destinations for doing business in Asia, with its economy ranked as the world’s most competitive in the 2019 World Economic Forum Global Competitiveness Report. Favorable factors contributing to Singapore’s economic performance include an open economy, strong labor-employer relations, a diverse cosmopolitan workforce, as well as government stability and responsiveness to change.
Singapore also offers the best quality of life in Asia, based on Mercer’s 2019 Quality of Living Survey, which considers factors like political stability, healthcare, education, crime, recreation and transport. People have been attracted to this cosmopolitan island state by its vibrant economy, low personal income taxes, cultural diversity and high quality of living. Presently, the immigrant population in Singapore numbers 2.16 million and makes up ~40% of the total population of ~5.7 million people.
However, cost of living is a concern, with Singapore being rated as the world’s most expensive city by the Economist Intelligence Unit’s 2018 Worldwide Cost of Living Report. Furthermore, Singapore has a large population for its land area, with ~8,000 people per km2. This makes Singapore 230 times denser than the United States, and more than 2,500 times denser than Australia.
As such, the goal of this analysis is to identify the most livable neighborhoods in Singapore for individuals looking to relocate to Singapore and those considering moving within Singapore. For the purpose of this exercise, we will define the most livable neighborhoods as having: (i) an affordable median rental price, (ii) a tolerable population density, (iii) a balanced mix of amenities in the neighborhood, and lastly (iv) a wide selection of good food options nearby.
This analysis will require the use of the following data sources:
Singapore Median Rent by Town and Flat Type
Data on Singapore towns and corresponding median rental prices by town and flat type will be retrieved from Data.gov.sg (https://data.gov.sg), the government’s one-stop access portal to publicly available datasets. Since Median Rent by Town and Flat Type data covers information from April 1, 2005 to December 31, 2019 on a quarterly basis, we will be using 2019-Q4 data for this analysis, as this is the most recent dataset. To simplify the analysis, the average rental price for each town will be determined by the median rental price for 4-room flat types in that town, as it is available as a benchmark across almost all towns.
Singapore Population Density by Town
Data on Singapore’s population density by town will be obtained by scraping data from the Wikipedia page on ‘Planning Areas of Singapore’ (https://en.wikipedia.org/wiki/Planning_Areas_of_Singapore), which contains data on town name, region, area (km2), population and density (/km2). Population density (people per km2) is a measure of the degree of 'crowding' of the town and is calculated by dividing the town’s population by total area of town.
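As a minimal sketch of the density calculation described above, the snippet below divides population by area for two made-up towns. The town names and figures are hypothetical and are not taken from the Wikipedia table; `population_density` is an illustrative helper, not part of the notebook.

```python
# Hypothetical figures for illustration only; not taken from the Wikipedia table.
towns = [
    {"town": "TOWN A", "population": 120_000, "area_km2": 10.0},
    {"town": "TOWN B", "population": 45_000, "area_km2": 15.0},
]


def population_density(population, area_km2):
    """Return people per square kilometre; reject non-positive areas."""
    if area_km2 <= 0:
        raise ValueError("area must be positive")
    return population / area_km2


# Map each town to its density in people per km2.
densities = {
    t["town"]: population_density(t["population"], t["area_km2"]) for t in towns
}
print(densities)
```

In the notebook itself the density column is read directly from the scraped table, so this calculation only serves as a cross-check of what that column means.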
Singapore Town Location Data
Singapore’s geospatial data will be retrieved from Data.gov.sg (https://data.gov.sg). Master Plan 2019 Planning Area Boundary (No Sea) data provides indicative polygons of planning area boundary, and this GeoJSON data on Singapore’s planning areas will enable visualization on maps. In parallel, geographic coordinates of town centers will be retrieved using Google Maps, with coordinates of MRT stations being used as the center for all towns for the purpose of this analysis.
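The choropleth later in this notebook joins the tabular data to the GeoJSON polygons through `feature.properties.Name`. The sketch below shows, on a hypothetical two-feature collection, how that property can be listed and compared against the names in a data table before plotting. The feature names, coordinates and helper functions are assumptions made for illustration; the real file has one feature per planning area.

```python
import json

# Hypothetical FeatureCollection; names and coordinates are made up.
sample_geojson = json.loads("""
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"Name": "AREA_1"},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[103.8, 1.30], [103.9, 1.30],
                                   [103.9, 1.35], [103.8, 1.30]]]}},
    {"type": "Feature",
     "properties": {"Name": "AREA_2"},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[103.7, 1.30], [103.8, 1.30],
                                   [103.8, 1.35], [103.7, 1.30]]]}}
  ]
}
""")


def feature_names(geojson, key="Name"):
    """Collect the value of one property from every feature."""
    return [feature["properties"][key] for feature in geojson["features"]]


def unmatched(geojson, data_names, key="Name"):
    """Names present in the tabular data but absent from the GeoJSON."""
    return sorted(set(data_names) - set(feature_names(geojson, key)))


names = feature_names(sample_geojson)
missing = unmatched(sample_geojson, ["AREA_1", "AREA_3"])
print(names, missing)
```

Rows whose name has no matching feature would not be coloured on the map, so a check like this can catch mismatches early.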
Singapore Venue Information from Foursquare API
Foursquare API (https://foursquare.com/) will be used to explore the neighborhoods of each town. Using Foursquare API, we will identify the venues in each neighborhood, to assess whether there is a balanced mix of amenities and to determine the most common venue categories. In addition, we will also be using Foursquare API to retrieve venue ratings for each location. However, as venue ratings are a premium endpoint, we are limited to only 50 premium calls per day with a Personal account on Foursquare API. In view of this constraint, we will limit the analysis of ratings to only food venues, since this is where ratings will likely matter more.
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import numpy as np # library to handle data in a vectorized manner
import json # library to handle JSON files
!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe
# Matplotlib and associated plotting modules
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors
import seaborn as sns
%matplotlib inline
mpl.style.use('ggplot')
# import StandardScaler for normalizing data
from sklearn.preprocessing import StandardScaler
# import k-means for clustering stage
from sklearn.cluster import KMeans
!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library
print('Libraries imported.')
Extract data from zip file
import zipfile
!wget -q -O 'median-rent-by-town-and-flat-type.zip' "https://data.gov.sg/dataset/b35046dc-7428-4cff-968d-ef4c3e9e6c99/download"
zf = zipfile.ZipFile('./median-rent-by-town-and-flat-type.zip')
sg_median_rent_raw = pd.read_csv(zf.open("median-rent-by-town-and-flat-type.csv"))
sg_median_rent_raw.head()
Data clean-up
# Drop rows with rental price = 'na'.
sg_median_rent=sg_median_rent_raw[~sg_median_rent_raw['median_rent'].isin(['-','na'])]
sg_median_rent.head()
# Retain only 2019-Q4 data, as it is most current
sg_median_rent=sg_median_rent[sg_median_rent['quarter'] == "2019-Q4"]
sg_median_rent.head()
# Consider only 4-RM flat type, as it is available across almost all towns
sg_median_rent = sg_median_rent[sg_median_rent['flat_type'] == "4-RM"].copy() # copy so later column edits do not act on a slice
# Reset index, because multiple rows dropped
sg_median_rent.reset_index(drop=True, inplace=True)
sg_median_rent
# Replace CENTRAL with OUTRAM, and KALLANG/WHAMPOA with KALLANG in 'town'
sg_median_rent['town']= sg_median_rent['town'].replace('CENTRAL', 'OUTRAM')
sg_median_rent['town']= sg_median_rent['town'].replace('KALLANG/WHAMPOA', 'KALLANG')
sg_median_rent
# Check data types
sg_median_rent.dtypes
# Convert median rent to float64
sg_median_rent['median_rent']=sg_median_rent['median_rent'].astype(np.float64)
sg_median_rent.head()
# drop columns 'quarter', 'flat_type' from 'sg_median_rent' dataframe
sg_median_rent.drop(['quarter', 'flat_type'], axis = 1, inplace=True)
sg_median_rent.head()
# rename columns 'town' to 'Town', 'median_rent' to 'Median Rent (SGD/month)'
sg_median_rent.rename(columns={'town': 'Town', 'median_rent': 'Median Rent (SGD/month)'}, inplace=True)
sg_median_rent.head()
# sort by Median Rent
sg_median_rent_graph = sg_median_rent.sort_values('Median Rent (SGD/month)', ascending=True)
sg_median_rent_graph.head()
sg_median_rent_graph.shape
# plot horizontal bar chart for Median Rent by Town
sg_median_rent_graph.plot(kind='barh', figsize=(12,8), color='steelblue')
plt.xlabel('Median Rent (SGD/month)')
plt.yticks(range(len(sg_median_rent_graph)), sg_median_rent_graph['Town'])
plt.title('Median Rent for 4-Room Flat by Town in Singapore for Q4 2019')
plt.show()
# plot histogram for Median Rent
sg_median_rent_graph.plot(kind='hist', figsize=(12,8))
plt.xlabel('Median Rent (SGD/month)')
plt.ylabel('Number of Towns in Singapore')
plt.title('Histogram of Median Rent for 4-Room Flat in Singapore for Q4 2019')
plt.show()
Web scraping
# scrape table info from Wikipedia page and put into pandas dataframe
sg_population_density_raw = pd.read_html('https://en.wikipedia.org/wiki/Planning_Areas_of_Singapore', header=0)[2]
sg_population_density_raw.head()
# Check data types
sg_population_density_raw.dtypes
Data clean-up & wrangling
# drop columns 'Malay', 'Chinese', 'Pinyin', 'Tamil' from 'sg_population_density_raw' dataframe
sg_population_density_raw.drop(['Malay', 'Chinese', 'Pinyin', 'Tamil'], axis = 1, inplace=True)
sg_population_density_raw.head()
# rename columns 'Name (English)' to 'Town', 'Population[7]' to 'Population'
sg_population_density_raw.rename(columns={'Name (English)': 'Town', 'Population[7]': 'Population'}, inplace=True)
sg_population_density_raw.head()
# create new dataframe 'sg_population_density'
sg_population_density = sg_population_density_raw.copy() # copy so the raw dataframe is left unchanged
# drop columns 'Region', 'Area (km2)', 'Population' from 'sg_population_density' dataframe
sg_population_density.drop(['Region', 'Area (km2)', 'Population'], axis = 1, inplace=True)
# rename column 'Density (/km2)' to 'Population Density (people/km2)'
sg_population_density.rename(columns={'Density (/km2)': 'Population Density (people/km2)'}, inplace=True)
sg_population_density.head()
# capitalize values for 'Town'
sg_population_density['Town'] = sg_population_density['Town'].str.upper()
sg_population_density
# replace "*" placeholders with 0
sg_population_density.replace('*', 0, inplace = True)
sg_population_density
# Convert 'Population Density (people/km2)' to float64
sg_population_density['Population Density (people/km2)']=sg_population_density['Population Density (people/km2)'].astype(np.float64)
sg_population_density.head()
# sort by Population Density (people/km2)
sg_population_density_graph = sg_population_density.sort_values('Population Density (people/km2)', ascending=True)
sg_population_density_graph
sg_population_density_graph.shape
# plot horizontal bar chart for Population Density by Town
sg_population_density_graph.plot(kind='barh', figsize=(12,12), color='steelblue')
plt.xlabel('Population Density (people/km2)')
plt.yticks(range(len(sg_population_density_graph)), sg_population_density_graph['Town'])
plt.title('Population Density by Town / Planning Area in Singapore')
plt.show()
# plot histogram for Population Density
sg_population_density_graph.plot(kind='hist', figsize=(12,8))
plt.xlabel('Population Density (people/km2)')
plt.ylabel('Number of Towns in Singapore')
plt.title('Histogram of Population Density for Towns / Planning Areas in Singapore')
plt.show()
# merge sg_median_rent and sg_population_density datasets on 'Town' column value
sg_corr_data = pd.merge(sg_median_rent, sg_population_density, on='Town')
sg_corr_data.head()
# plot scatter plot of Population Density and Median Rent
sns.regplot(x='Population Density (people/km2)', y='Median Rent (SGD/month)', data=sg_corr_data)
# examine correlation between Population Density and Median Rent
sg_corr_data[['Population Density (people/km2)', 'Median Rent (SGD/month)']].corr()
Population Density provides some predictive value for Median Rent, with a correlation of approximately -0.450.
Geographic coordinates of town centers were retrieved using Google Maps, with the coordinates of MRT stations used as the center of each town for the purpose of this analysis. Data from this exercise is saved in a separate CSV file.
# Retrieve csv file for Town Center coordinates
sg_town_center = pd.read_csv('https://raw.githubusercontent.com/RaphaelO-SG/Coursera_Capstone/master/The%20Battle%20of%20Neighborhoods/MRT-station-coordinates.csv')
sg_town_center.head()
# rename columns 'town' to 'Town', 'lat' to 'Town Latitude', 'lng' to 'Town Longitude'
sg_town_center.rename(columns={'town': 'Town', 'lat': 'Town Latitude', 'lng': 'Town Longitude'}, inplace=True)
sg_town_center
# merge sg_town_center and sg_median_rent datasets on 'Town' column value
sg_town_data = pd.merge(sg_town_center, sg_median_rent, on='Town')
sg_town_data.head()
# merge sg_town_data and sg_population_density datasets on 'Town' column value
sg_town_data = pd.merge(sg_town_data, sg_population_density, on='Town')
sg_town_data
# get geographical coordinates for Singapore with geopy library
address = 'Singapore'
geolocator = Nominatim(user_agent="SG explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Singapore are {}, {}.'.format(latitude, longitude))
Generate map of Singapore with Towns and Median Rent (SGD / month) superimposed on top.
# create map of Singapore using latitude and longitude values
map_singapore = folium.Map(location=[latitude, longitude], zoom_start=12)
# add markers to map
for lat, lng, town, median_rent in zip(sg_town_data['Town Latitude'], sg_town_data['Town Longitude'], sg_town_data['Town'], sg_town_data['Median Rent (SGD/month)']):
    label = '{}, SGD {} / month'.format(town, median_rent)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_singapore)
map_singapore
# Retrieve csv file containing Planning Area names in GeoJSON file by Town
sg_planning_area = pd.read_csv('https://raw.githubusercontent.com/RaphaelO-SG/Coursera_Capstone/master/The%20Battle%20of%20Neighborhoods/singapore-2019-planning-area.csv')
sg_planning_area.head()
# rename column 'name' to 'Name', and 'town' to 'Town'
sg_planning_area.rename(columns={'name': 'Name', 'town': 'Town'}, inplace=True)
sg_planning_area.head()
# merge datasets on Town column value
sg_choropleth = pd.merge(sg_planning_area, sg_population_density, on='Town')
sg_choropleth.head()
# download Singapore geojson file
!wget --quiet https://github.com/RaphaelO-SG/Coursera_Capstone/raw/master/The%20Battle%20of%20Neighborhoods/master-plan-2019-planning-area-boundary-no-sea-geojson.geojson
print('GeoJSON file downloaded!')
sg_geo = r'master-plan-2019-planning-area-boundary-no-sea-geojson.geojson'
# create map of Singapore using latitude and longitude values
map_singapore_choropleth = folium.Map(location=[latitude, longitude], zoom_start=12)
# create a numpy array of length 6 and has linear spacing from the minimum to maximum population density
threshold_scale = np.linspace(sg_choropleth['Population Density (people/km2)'].min(),
sg_choropleth['Population Density (people/km2)'].max(),
6, dtype=int)
threshold_scale = threshold_scale.tolist() # change the numpy array to a list
threshold_scale[-1] = threshold_scale[-1] + 0.01 # make sure that the last value of the list is greater than the maximum population density
# add choropleth layer on map
map_singapore_choropleth.choropleth(
geo_data=sg_geo,
data=sg_choropleth,
columns=['Name', 'Population Density (people/km2)'],
key_on='feature.properties.Name',
threshold_scale=threshold_scale,
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='Singapore Population Density (people/km2) by Planning Area'
)
# add markers to map
for lat, lng, town, median_rent in zip(sg_town_data['Town Latitude'], sg_town_data['Town Longitude'], sg_town_data['Town'], sg_town_data['Median Rent (SGD/month)']):
    label = '{}, SGD {} / month'.format(town, median_rent)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_singapore_choropleth)
# display map
map_singapore_choropleth
# @hidden_cell
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
print('Your credentials:')
print('CLIENT_ID: loaded')
print('CLIENT_SECRET: loaded')
Define a function that takes the latitude and longitude of each town and retrieves the venues within a radius of 500 meters.
def getNearbyVenues(names, latitudes, longitudes, radius=500, limit=100):
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            limit)
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        # return only relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])
    # flatten the per-town lists into a single dataframe
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Town',
                             'Town Latitude',
                             'Town Longitude',
                             'Venue',
                             'Venue Latitude',
                             'Venue Longitude',
                             'Venue Category']
    return nearby_venues
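The function above indexes straight into `["response"]['groups'][0]['items']` and `['categories'][0]`, which would raise an error if a town returned no groups or a venue had no category. The sketch below shows one hedged way to parse the same structure more defensively. The sample payload is hypothetical and only mirrors the keys used above, and `extract_venues` is an illustrative helper, not part of the notebook or of the Foursquare API.

```python
def extract_venues(payload):
    """Return (name, lat, lng, category) tuples from an explore-style response dict."""
    groups = payload.get("response", {}).get("groups", [])
    if not groups:
        return []  # no recommendations returned for this location
    rows = []
    for item in groups[0].get("items", []):
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Use None when a venue carries no category instead of raising IndexError.
        category = categories[0]["name"] if categories else None
        rows.append((
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows


# Hypothetical payload that mirrors only the keys used by getNearbyVenues.
sample = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "Example Cafe",
                   "location": {"lat": 1.30, "lng": 103.80},
                   "categories": [{"name": "Café"}]}},
        {"venue": {"name": "Unlabelled Spot",
                   "location": {"lat": 1.31, "lng": 103.81},
                   "categories": []}},
    ]}]}
}

rows = extract_venues(sample)
empty = extract_venues({"response": {}})
print(rows)
```

Venues with a `None` category would then need to be dropped or labelled before one-hot encoding.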
Run the above function on each town and create a new dataframe sg_venues
# call getNearbyVenues for each Neighborhood
sg_venues = getNearbyVenues(names=sg_town_data['Town'],
latitudes=sg_town_data['Town Latitude'],
longitudes=sg_town_data['Town Longitude']
)
Check size of resulting dataframe
print (sg_venues.shape)
sg_venues.head()
Check number of venues returned for each town
sg_venues.groupby('Town').count()
Number of unique categories curated from all the returned venues
print('There are {} unique categories.'.format(len(sg_venues['Venue Category'].unique())))
# one hot encoding
sg_onehot = pd.get_dummies(sg_venues[['Venue Category']], prefix="", prefix_sep="")
# add Town column back to dataframe
sg_onehot['Town'] = sg_venues['Town']
# move Town column to the first column
sg_onehot.drop(labels=['Town'], axis=1,inplace = True)
sg_onehot.insert(loc=0, column='Town', value=sg_venues['Town'].to_list())
sg_onehot.head()
sg_onehot.shape
Group rows by Town and by taking total occurrences of each category
sg_grouped_total = sg_onehot.groupby('Town').sum().reset_index()
sg_grouped_total
sg_grouped_total.shape
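One-hot encoding followed by a group-by sum yields, for each town, the number of venues in each category. The plain-Python sketch below builds the same kind of table from a handful of hypothetical (town, category) pairs so that the result is easy to inspect. The town names and venues are made up.

```python
from collections import Counter, defaultdict

# Hypothetical (town, venue category) pairs standing in for sg_venues.
venues = [
    ("TOWN A", "Coffee Shop"),
    ("TOWN A", "Coffee Shop"),
    ("TOWN A", "Gym"),
    ("TOWN B", "Food Court"),
]

# Count occurrences of each category within each town.
counts = defaultdict(Counter)
for town, category in venues:
    counts[town][category] += 1

# One column per category, one row per town, zeros where a category is absent.
categories = sorted({category for _, category in venues})
table = {town: [counts[town][c] for c in categories] for town in sorted(counts)}

print(categories)
print(table)
```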
# Understand which are the most common Venue Categories across all Towns
sg_venues_count = sg_venues.groupby('Venue Category').count()
sg_venues_count = sg_venues_count.sort_values('Venue', ascending=False).reset_index()
sg_venues_count.head()
# drop columns 'Town', 'Town Latitude', 'Town Longitude', 'Venue Latitude', 'Venue Longitude' from 'sg_venues_count' dataframe
sg_venues_count.drop(['Town', 'Town Latitude', 'Town Longitude', 'Venue Latitude', 'Venue Longitude'], axis = 1, inplace=True)
# rename column 'Venue' to 'Venue Count'
sg_venues_count.rename(columns={'Venue': 'Venue Count'}, inplace=True)
sg_venues_count
# get data for Top 40 Venue Categories
sg_venues_top40 = sg_venues_count.head(40)
sg_venues_top40
# sort by Venue Count
sg_venues_top40_graph = sg_venues_top40.sort_values('Venue Count', ascending=True)
sg_venues_top40_graph.head()
# plot horizontal bar chart for Top 40 Venue Categories
sg_venues_top40_graph.plot(kind='barh', figsize=(12,8), color='steelblue')
plt.xlabel('Venue Count')
plt.yticks(range(40), sg_venues_top40_graph['Venue Category'])
plt.title('Top 40 Venue Categories')
plt.show()
# select 10 key amenities to assess whether there is a balanced mix of amenities in each town
sg_amenities = sg_grouped_total[['Coffee Shop', 'Food Court', 'Fast Food Restaurant', 'Café', 'Shopping Mall', 'Supermarket', 'Clothing Store', 'Bookstore', 'Convenience Store', 'Gym']]
sg_amenities.head()
# For 10 key amenities selected, use 0 to indicate absence, and 1 to indicate presence in each town
sg_amenities = sg_amenities.clip(upper=1)
sg_amenities.head()
# Determine how many of the 10 key amenities each town has
sg_amenities['Total Amenities'] = sg_amenities.sum(axis=1)
sg_amenities.head()
# add Town column back to sg_amenities dataframe
sg_amenities['Town'] = sg_grouped_total['Town']
# move Town column to the first column
sg_amenities.drop(labels=['Town'], axis=1,inplace = True)
sg_amenities.insert(loc=0, column='Town', value=sg_grouped_total['Town'].to_list())
sg_amenities
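The 'Total Amenities' score is therefore the number of key categories that appear at least once near a town, regardless of how many venues each category has. A minimal sketch of that rule, using the ten categories selected above and hypothetical venue counts for a single town (`total_amenities` is an illustrative helper):

```python
# The ten key categories selected in the notebook.
KEY_AMENITIES = [
    "Coffee Shop", "Food Court", "Fast Food Restaurant", "Café",
    "Shopping Mall", "Supermarket", "Clothing Store", "Bookstore",
    "Convenience Store", "Gym",
]


def total_amenities(category_counts):
    """Count how many key categories are present at least once."""
    return sum(1 for name in KEY_AMENITIES if category_counts.get(name, 0) > 0)


# Hypothetical venue counts for one town; "Park" is not a key amenity.
town_counts = {"Coffee Shop": 5, "Gym": 1, "Supermarket": 2, "Park": 3}
score = total_amenities(town_counts)
print(score)
```

Five coffee shops count the same as one, which is the effect of clipping the counts to an upper bound of 1 before summing.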
# create dataframe with only Total Amenities
sg_amenities_total = sg_amenities[['Town', 'Total Amenities']]
sg_amenities_total.head()
# sort by Total Amenities
sg_amenities_graph = sg_amenities_total.sort_values('Total Amenities', ascending=True)
sg_amenities_graph
sg_amenities_graph.shape
# plot horizontal bar chart for Total Amenities by Town
sg_amenities_graph.plot(kind='barh', figsize=(12,8), color='steelblue')
plt.xlabel('Count of Key Venue Categories')
plt.yticks(range(len(sg_amenities_graph)), sg_amenities_graph['Town'])
plt.title('Towns in Singapore with Key Venue Categories Required for Balanced Mix of Amenities')
plt.show()
# plot histogram for Total Amenities
sg_amenities_graph.plot(kind='hist', figsize=(12,8))
plt.xlabel('Count of Key Venue Categories')
plt.ylabel('Number of Towns in Singapore')
plt.title('Histogram of Towns in Singapore with Key Venue Categories Required for Balanced Mix of Amenities')
plt.show()
# merge sg_town_data and sg_amenities_total datasets on 'Town' column value
sg_town_amenities_data = pd.merge(sg_town_data, sg_amenities_total, on='Town')
sg_town_amenities_data
# plot scatter plot of Total Amenities and Median Rent
sns.regplot(x='Total Amenities', y='Median Rent (SGD/month)', data=sg_town_amenities_data)
# plot scatter plot of Total Amenities and Population Density
sns.regplot(x='Total Amenities', y='Population Density (people/km2)', data=sg_town_amenities_data)
# examine correlations among Median Rent, Population Density and Total Amenities
sg_town_amenities_data[['Median Rent (SGD/month)','Population Density (people/km2)','Total Amenities']].corr()
Population Density provides some predictive value for Median Rent, with correlation at approximately -0.450.
Total Amenities provides fairly good predictive value for Median Rent, with correlation at approximately -0.653.
Population Density is however not a good predictor for Total Amenities, with correlation at approximately 0.364.
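The figures quoted above are Pearson correlation coefficients, which is the default method of `DataFrame.corr()`. The sketch below computes the same statistic by hand on hypothetical values, to make the sign and scale easier to read. The numbers are made up and are not the notebook's data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: more amenities paired with lower rent.
amenities = [2, 4, 6, 8]
rent = [2600, 2400, 2300, 2000]
r = pearson(amenities, rent)
print(round(r, 3))
```

A value near -1 or +1 indicates a strong linear relationship and a value near 0 a weak one. With only a few dozen towns, moderate coefficients such as those above should be read with some caution.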
Remove Town Latitude and Town Longitude, as these are not applicable in the clustering analysis.
df = sg_town_amenities_data[['Town', 'Median Rent (SGD/month)', 'Population Density (people/km2)', 'Total Amenities']].copy()
df.head()
Normalization rescales features with different magnitudes and distributions so that distance-based algorithms such as k-means weigh them equally. StandardScaler() will be used to normalize the dataset by rescaling each feature to zero mean and unit variance.
X = df.values[:,1:]
Clus_dataSet = StandardScaler().fit_transform(X)
Clus_dataSet
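To illustrate what StandardScaler does to each column, the sketch below applies the same z-score formula, (value - mean) / standard deviation with the population standard deviation, to two hypothetical features on very different scales. The values are made up and `standardize` is an illustrative helper.

```python
import math


def standardize(column):
    """Z-score a list of numbers using the population standard deviation."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    if std == 0:
        return [0.0 for _ in column]  # a constant column carries no information
    return [(v - mean) / std for v in column]


# Hypothetical features: rent in SGD/month and an amenity count.
rent = [2000.0, 2300.0, 2600.0]
amen = [3.0, 6.0, 9.0]

z_rent = standardize(rent)
z_amen = standardize(amen)
print(z_rent)
print(z_amen)
```

After rescaling, both columns span the same range, so neither dominates the Euclidean distances that k-means relies on.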
For each k value, we will initialise k-means and use the inertia attribute to identify the sum of squared distances of samples to the nearest cluster centre
Sum_of_squared_distances = []
K = range(1,15)
for k in K:
    km = KMeans(n_clusters=k)
    km = km.fit(Clus_dataSet)
    Sum_of_squared_distances.append(km.inertia_)
Plot sum of squared distances for k in the range specified above. If the plot looks like an arm, then the elbow on the arm is optimal k.
plt.plot(K, Sum_of_squared_distances, 'bx-')
plt.xlabel('k')
plt.ylabel('Sum_of_squared_distances')
plt.title('Elbow Method For Optimal k')
plt.show()
Based on Elbow Method, optimal k = 3
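Here the elbow was read off the plot by eye. As a rough cross-check, one common heuristic is to pick the k at which the curve bends most sharply, measured by the largest second difference of the inertia values. The sketch below applies that heuristic to hypothetical inertia values. It is an assumption-laden illustration, not the method used in this notebook, and it assumes evenly spaced values of k.

```python
def elbow_by_second_difference(ks, inertias):
    """Return the k where the inertia curve bends most sharply."""
    if len(ks) != len(inertias) or len(ks) < 3:
        raise ValueError("need at least three (k, inertia) pairs")
    best_k, best_bend = None, float("-inf")
    # The first and last k have no neighbours on both sides, so skip them.
    for i in range(1, len(ks) - 1):
        bend = inertias[i - 1] - 2 * inertias[i] + inertias[i + 1]
        if bend > best_bend:
            best_k, best_bend = ks[i], bend
    return best_k


# Hypothetical inertia values for k = 1..7; not the notebook's results.
ks = list(range(1, 8))
inertias = [75.0, 40.0, 18.0, 14.0, 11.0, 9.0, 8.0]
elbow_k = elbow_by_second_difference(ks, inertias)
print(elbow_k)
```

On noisy curves the heuristic can disagree with a visual reading, so it is best treated as a sanity check rather than a replacement for inspecting the plot.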
Run k-means to cluster the towns into 3 clusters
clusterNum = 3
k_means = KMeans(init = "k-means++", n_clusters = clusterNum, n_init = 12)
k_means.fit(Clus_dataSet) # fit on the normalized features, as used for the elbow analysis
labels = k_means.labels_
print(labels)
Assign labels to each row in dataframe
sg_town_amenities_data['Clus_km'] = labels
sg_town_amenities_data['Clus_km'] = sg_town_amenities_data['Clus_km'] + 1
sg_town_amenities_data.head()
df['Clus_km'] = labels
df['Clus_km'] = df['Clus_km'] + 1
df.head()
Check centroid values by averaging features in each cluster
df.groupby('Clus_km').mean(numeric_only=True)
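Averaging the features within each cluster gives a per-cluster profile in the original units (SGD/month, amenity counts), which is easier to interpret than centroids expressed in normalized units. The sketch below reproduces that group-and-average step on hypothetical rows. The values and the `cluster_means` helper are made up for illustration.

```python
from collections import defaultdict

# Hypothetical rows of (cluster label, median rent, total amenities).
rows = [
    (1, 2000.0, 6),
    (1, 2200.0, 8),
    (2, 2600.0, 3),
]


def cluster_means(rows):
    """Return {cluster: (mean rent, mean amenities)} for the given rows."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # rent sum, amenity sum, row count
    for cluster, rent, amenities in rows:
        acc = sums[cluster]
        acc[0] += rent
        acc[1] += amenities
        acc[2] += 1
    return {c: (s[0] / s[2], s[1] / s[2]) for c, s in sorted(sums.items())}


means = cluster_means(rows)
print(means)
```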
Examine distribution of towns based on clusters
df.groupby('Clus_km').describe().transpose()
sg_geo = r'master-plan-2019-planning-area-boundary-no-sea-geojson.geojson'
# create map of Singapore using latitude and longitude values
map_singapore_choropleth = folium.Map(location=[latitude, longitude], zoom_start=12)
# create a numpy array of length 6 and has linear spacing from the minimum to maximum population density
threshold_scale = np.linspace(sg_choropleth['Population Density (people/km2)'].min(),
sg_choropleth['Population Density (people/km2)'].max(),
6, dtype=int)
threshold_scale = threshold_scale.tolist() # change the numpy array to a list
threshold_scale[-1] = threshold_scale[-1] + 0.01 # make sure that the last value of the list is greater than the maximum population density
# add choropleth layer on map
map_singapore_choropleth.choropleth(
geo_data=sg_geo,
data=sg_choropleth,
columns=['Name', 'Population Density (people/km2)'],
key_on='feature.properties.Name',
threshold_scale=threshold_scale,
fill_color='YlOrRd',
fill_opacity=0.7,
line_opacity=0.2,
legend_name='Singapore Population Density (people/km2) by Planning Area'
)
# set color scheme for the clusters
x = np.arange(clusterNum)
ys = [i + x + (i*x)**2 for i in range(clusterNum)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]
# add markers to map
for lat, lng, town, median_rent, amenities, cluster in zip(sg_town_amenities_data['Town Latitude'], sg_town_amenities_data['Town Longitude'], sg_town_amenities_data['Town'], sg_town_amenities_data['Median Rent (SGD/month)'], sg_town_amenities_data['Total Amenities'], sg_town_amenities_data['Clus_km']):
    label = '{}, SGD {} / month, {} Amenities, Cluster {}'.format(town, median_rent, amenities, cluster)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7,
        parse_html=False).add_to(map_singapore_choropleth)
# display map
map_singapore_choropleth
Cluster 1 (Purple) = High Population Density, Low Rent, Moderately Balanced Mix of Amenities
Cluster 2 (Green) = Low Population Density, Variable Rent, Variable Mix of Amenities
Cluster 3 (Red) = Moderate Population Density, Variable Rent, Most Balanced Mix of Amenities
df.sort_values('Clus_km', ascending=True).reset_index()